Cirscan: a shiny application to identify differentially active sponge mechanisms and visualize circRNA–miRNA–mRNA networks

Background Non-coding RNAs represent a large part of the human transcriptome and have been shown to play an important role in disease such as cancer. However, their biological functions are still incompletely understood. Among non-coding RNAs, circular RNAs (circRNAs) have recently been identified for their microRNA (miRNA) sponge function which allows them to modulate the expression of miRNA target genes by taking on the role of competitive endogenous RNAs (ce-circRNAs). Today, most computational tools are not adapted to the search for ce-circRNAs or have not been developed for the search for ce-circRNAs from user’s transcriptomic data. Results In this study, we present Cirscan (CIRcular RNA Sponge CANdidates), an interactive Shiny application that automatically infers circRNA–miRNA–mRNA networks from human multi-level transcript expression data from two biological conditions (e.g. tumor versus normal conditions in the case of cancer study) in order to identify on a large scale, potential sponge mechanisms active in a specific condition. Cirscan ranks each circRNA–miRNA–mRNA subnetwork according to a sponge score that integrates multiple criteria based on interaction reliability and expression level. Finally, the top ranked sponge mechanisms can be visualized as networks and an enrichment analysis is performed to help its biological interpretation. We showed on two real case studies that Cirscan is capable of retrieving sponge mechanisms previously described, as well as identifying potential novel circRNA sponge candidates. Conclusions Cirscan can be considered as a companion tool for biologists, facilitating their ability to prioritize sponge mechanisms for experimental validations and identifying potential therapeutic targets. Cirscan is implemented in R, released under the license GPL-3 and accessible on GitLab (https://gitlab.com/geobioinfo/cirscan_Rshiny). The scripts used in this paper are also provided on Gitlab (https://gitlab.com/geobioinfo/cirscan_paper). Supplementary Information The online version contains supplementary material available at 10.1186/s12859-024-05668-y.


In-house database of miRNA-target interactions based on predicted and experimentally validated interactions
A miRNA-target interaction is established between a region of the miRNA, called "seed sequence" and a complementarity sequence called "miRNA Recognition Element" (MRE) on the target (circRNA or mRNA).An in-house database of interactions between miRNAs and putative mRNA or circRNA targets was constructed using the TargetScan interaction prediction tool (McGeary et al. 2019) and information from experimentally validated interaction from the ENCORI database (Li et al. 2014).
TargetScan was used to predict miRNA-circRNA and miRNA-mRNA interactions, and to obtain an affinity score for each interaction, by taking into account the specificity of the RNA type considered (mRNA or circRNA).For miRNA-circRNA interactions, we used 140,790 circRNA sequences from circBase (Glažar, Papavasileiou, and Rajewsky 2014) and 9,994 miRNA sequences from TargetScan (Garcia et al. 2011).TargetScan v6 calculates for each miRNA-circRNA interaction an affinity score called context+ score, corresponding to the sum of the contribution of six features (site-type, 3'compensatory pairing, local AU, position, target site abundance and seed-pairing stability).The feature 3'-compensatory pairing was removed here because it is not sufficiently justifiable at present, as the specific mechanisms of circRNA-miRNA pairing remain poorly described in the literature.The lower the context+ score, the stronger the affinity between the miRNA and its target.To reduce the number of false positive predictions, we defined a cutoff on the context+ score, based on its distribution on experimentally validated interactions given by the ENCORI database.We defined as a cutoff the 95th percentile of the context+ score distribution restricted to the interactions supported by at least two CLIP-seq and two degradome-seq experiments (Figure S1).We used the ENCORI/Starbase API [http://starbase.sysu.edu.cn/tutorialAPI.php] to download the experimentally validated interactions contained in this database.
We retrieved 2,055,407 miRNA-circRNA experimentally validated interactions: c u r l ' h t t p s : / / s t a r b a s e .s y s u .edu .cn / a p i /miRNATarget /? assembly=hg19& geneType=circRNA&miRNA=a l l&clipExpNum=0&degraExpNum=0&pancancerNum=0& programNum=1&program=None&t a r g e t=a l l&c e l l T y p e=a l l ' and 1,215,274 miRNA-mRNA experimentally validated interactions: c u r l ' h t t p s : / / s t a r b a s e .s y s u .edu .cn / a p i /miRNATarget /? assembly=hg19& geneType=mRNA&miRNA=a l l&clipExpNum=0&degraExpNum=0&pancancerNum=0& programNum=1&program=None&t a r g e t=a l l&c e l l T y p e=a l l ' For miRNA-mRNA interactions, we used 2,382,569 UTR sequences and 9,994 miRNA sequences from TargetScan (Garcia et al. 2011).TargetScan v8 calculates for each miRNA-mRNA interaction an affinity score called context++ score (Agarwal et al. 2015).Compared to the context+ score, the context++ score includes additional criteria specifically relevant for miRNA-mRNA interactions (e.g.3' UTR length, ORF length, probability of conserved targeting between species), which are expected to reduce false positive predictions.
2 Cirscan performace evaluation 2.1 Pre-processing of colorectal cancer data Microarray multi-level transcript expression data of 10 colorectal cancer (CRC) samples and 10 normal adjacent samples were downloaded from the NCBI Gene Expression Omnibus database (accession number: GSE126095).Each dataset was imported into the RStudio (v1.4.1103) environment with R (v4.2.2) for pre-processing.
For transcripts belonging to the gene annotation, an expression average was applied.Quantile normalization and a log-transformation were applied to the mRNAs expression matrix.mRNAs microarray matrix was reduced to protein-coding genes, using the annotation file provided by the authors.Using the limma R package (v.3.50.1) (Ritchie et al. 2015), 4,640 differentially expressed mRNAs with an adjusted p-value (Benjamini-Hochberg, BH) < 0.05 were selected.The circRNAs microarray matrix was also submitted to quantile normalization and a log-transformation.We selected 1,491 differentially expressed circRNAs with an adjusted p-value (BH) < 0.05 by using the limma R package (v.3.50.1) (Ritchie et al. 2015).Finally, the miRNAs microarray matrix was filtered to only include miRNA identifiers present in the annotation file, i.e. 2,055 miRNAs.Quantile normalization and log-transformation were also applied to the miRNAs expression matrix.These different preprocessed expression matrices were given as input to Cirscan.

Pre-processing of hepatocarcinoma data
circRNAs microarray expression data of 7 hepatocellular carcinoma (HCC) tissues and 7 non-tumor liver tissues were downloaded from the NCBI Gene Expression Omnibus database (accession number: GSE97332).mRNAs expression matrix of 424 liver samples (374 tumor tissues and 50 normal tissues) were downloaded from the TCGA (https://www.cancer.gov/tcga).Each dataset was imported into the RStudio (v1.4.1103) environment with R (v4.2.2) for pre-processing.circRNAs identifiers have been converted according to the nomenclature described by circBase (i.e.3,416 unique circRNAs).The circRNAs microarray expression matrix was restricted to 2,222 differentially expressed circRNAs with an adjusted p-value (BH) < 0.05 by using the limma R package (v.3.50.1) (Ritchie et al. 2015).mRNAs counts expression matrix was reduced to protein-coding genes, using the annotation file provided by using the Ensembl database and the Biomart web tool (Howe et al. 2021, https://www.ensembl.org/index.html).As mRNA expression data was already logtransformed, we performed a conversion to raw data (2^x-1 transformation) in order to use them with the DESeq2 R package (v1.40.2) (Love, Huber, and Anders 2014), and we selected 10,760 mRNAs differentially expressed with an adjusted p-value (BH) < 0.05.These different pre-processed expression matrices were given as input to Cirscan.Finally, the miRNAs expression data was directly retrieved from the signature of sufficiently expressed miRNA from liver cancer available in the Cirscan tool, as no miRNA expression dataset was generated by the same authors.

TargetScan performance evaluation
To demonstrate the efficiency of TargetScan predictions over random predictions in the context of sponge mechanisms, we constructed random miRNA-target prediction databases.This involves permuting the values of criteria related to TargetScan in databases, such as S Affinity , S NbMRE and S EnrichMRE for the circRNA-miRNA interaction database and S Affinity , S NbMRE for the mRNA-miRNA database.We performed 500 permutations and applied Cirscan to our two benchmark datasets (public colorectal cancer and hepatocarcinoma data).Our analysis focused on the enrichment of published mechanisms for these two cancers among the top-ranked sponge mechanisms identified by Cirscan.To assess statistical significance, we compared the distribution of enrichment p-values (-log10) of the 500 permutation results with the one obtained using TargetScan (described in the main text).As shown in Figure S2, the enrichment p-value (-log10) obtained with TargetScan predictions is greater than the 90th percentile of the distribution of 500 p-values (-log10) obtained with random prediction databases for both cancers.It is important to note that the sponge score calculated by Cirscan takes into account multiple criteria, some based on Targetscan predictions but also others based on expression levels.The final score is therefore a balance between all these criteria, which can explain significant p-values of the benchmark analysis using randomly generated target-miRNA pairs (Figure S2).However, it is noteworthy that these p-values remain notably less significant than those obtained using the true Targetscan predictions.Overall, this analysis shows the relevance of the predictions identified by TargetScan and the importance of this criteria in the identification of sponge mechanisms identified by Cirscan.

Figure S2 :
Figure S2: Distribution of the -log10 p-values of the enrichment analysis of the validated sponge mechanism (500 permutations on TargetScan miRNA-target interaction databases) for A) the colorectal benchmark dataset, B) hepatocarcinoma benchmark dataset.The blue line corresponds to the 90th percentile of this distribution, and the red line to the enrichment p-value obtained with databases of interactions predicted by TargetScan (as described in the main text).